DRSP: Dimension Reduction For Similarity Matching And Pruning Of Time Series Data Streams
Similarity matching and joining of time series data streams have gained
considerable relevance in today's world of large-scale streaming data. These
processes are widely applied in location tracking, sensor networks, and object
positioning and monitoring, to name a few. However, as the size of the data
stream increases, so does the cost of retaining all the data needed for
similarity matching. We develop a novel framework that addresses the following
objectives. First, dimension reduction is performed in the preprocessing stage:
large stream data is segmented and reduced to a compact representation that
retains all the crucial information, using a technique called Multi-level
Segment Means (MSM). This reduces the space complexity of storing large time
series data streams. Second, the framework incorporates an effective similarity
matching technique to analyze whether new data objects are similar to the
existing data stream. Finally, a pruning technique filters out pseudo data
object pairs and joins only the relevant pairs. The computational cost of MSM
is O(l*ni) and the cost of pruning is O(DRF*wsize*d), where DRF is the
Dimension Reduction Factor. Exhaustive experimental trials show that the
proposed framework is both efficient and competitive in comparison with
earlier works.
Comment: 20 pages, 8 figures, 6 tables
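The abstract does not spell out how MSM or the pruning step work, so the following is only a minimal Python sketch under stated assumptions. It assumes that level k of an MSM-style representation splits a window into 2**k equal-length segments and keeps each segment's mean (a multi-resolution piecewise aggregate approximation). For pruning it swaps in the standard piecewise-mean lower bound sqrt(w/s) * ||means_a - means_b|| <= ||a - b|| (w = window length, s = number of segments), so pairs whose bound already exceeds the threshold eps are discarded without a full-resolution comparison. All function names are hypothetical, and reading the ratio w/s as a dimension reduction factor is our interpretation, not the paper's definition.

```python
import math
from typing import List, Sequence, Tuple


def segment_means(window: Sequence[float], segments: int) -> List[float]:
    """Replace each of `segments` equal-length pieces of `window` by its mean."""
    if len(window) % segments != 0:
        raise ValueError("window length must be divisible by the segment count")
    size = len(window) // segments
    return [sum(window[i:i + size]) / size for i in range(0, len(window), size)]


def multi_level_segment_means(window: Sequence[float], levels: int) -> List[List[float]]:
    """Hypothetical MSM-style representation: level k holds 2**k segment means."""
    return [segment_means(window, 2 ** k) for k in range(levels)]


def lower_bound(means_a: Sequence[float], means_b: Sequence[float], length: int) -> float:
    """Lower bound on the Euclidean distance of the original windows.

    For equal segments of length L, Cauchy-Schwarz gives
    sum((a_i - b_i)**2 over a segment) >= L * (mean_a - mean_b)**2.
    """
    seg_len = length / len(means_a)
    return math.sqrt(seg_len * sum((x - y) ** 2 for x, y in zip(means_a, means_b)))


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def prune_pairs(
    stream_a: Sequence[Sequence[float]],
    stream_b: Sequence[Sequence[float]],
    eps: float,
    levels: int,
) -> List[Tuple[int, int]]:
    """Return index pairs (i, j) whose full-resolution distance is <= eps."""
    reps_a = [multi_level_segment_means(w, levels) for w in stream_a]
    reps_b = [multi_level_segment_means(w, levels) for w in stream_b]
    matches = []
    for i, (wa, ra) in enumerate(zip(stream_a, reps_a)):
        for j, (wb, rb) in enumerate(zip(stream_b, reps_b)):
            # Coarse levels are checked first; any bound above eps rules the pair out.
            if any(lower_bound(ra[k], rb[k], len(wa)) > eps for k in range(levels)):
                continue
            # Survivors get the exact (expensive) distance check.
            if euclidean(wa, wb) <= eps:
                matches.append((i, j))
    return matches
```

For example, with windows [[1, 1, 2, 2], [5, 5, 5, 5]] and [[1, 1, 2, 2.5], [0, 0, 0, 0]], eps = 1.0 and two levels, only the pair (0, 0) is returned; the other three pairs are rejected by the single-segment bound alone. Because every level's bound is a valid lower bound, this kind of pruning never discards a true match.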
Bidirectional Growth based Mining and Cyclic Behaviour Analysis of Web Sequential Patterns
Web sequential patterns are important for analyzing and understanding users'
behaviour in order to improve the quality of service offered by the World Wide
Web. Web prefetching is one such technique: it uses prefetching rules derived
through Cyclic Model Analysis of the mined Web sequential patterns. Predictions
become more accurate, and prefetching results more satisfying, when a highly
efficient and scalable mining technique such as the Bidirectional Growth based
Directed Acyclic Graph is used. In this paper, we propose a novel algorithm
called Bidirectional Growth based mining Cyclic behavior Analysis of web
sequential Patterns (BGCAP) that effectively combines these strategies to
generate prefetching rules in the form of 2-sequence patterns with periodicity
and a threshold of cyclic behaviour. These rules can be used to prefetch Web
pages effectively, thus reducing users' perceived latency. As BGCAP is based on
bidirectional pattern growth, it performs only (log n + 1) levels of recursion
for mining n Web sequential patterns. Our experimental results show that
generating prefetching rules with BGCAP is 5-10% faster than with TD-Mine for
different data sizes and 10-15% faster for a fixed data size. In addition,
BGCAP generates about 5-15% more prefetching rules than TD-Mine.
Comment: 19 pages
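BGCAP's bidirectional growth and cyclic analysis are not described here in enough detail to reproduce, but the shape of its output (prefetching rules in the form of 2-sequence patterns) can be illustrated. The Python sketch below is a plain counting approach, not BGCAP. It assumes a rule a -> b means page b was requested immediately after page a within a session, measures support as the fraction of sessions containing the pair and confidence as the fraction of sessions containing a that also contain a -> b, and ignores periodicity entirely. The function name and thresholds are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def two_sequence_rules(
    sessions: Sequence[Sequence[str]], min_support: float, min_confidence: float
) -> Dict[str, List[Tuple[str, float]]]:
    """Mine hypothetical rules `a -> b` (page b requested right after page a)."""
    page_count = Counter()
    pair_count = Counter()
    for session in sessions:
        # Count each page and each adjacent pair at most once per session.
        page_count.update(set(session))
        pair_count.update(set(zip(session, session[1:])))
    n = len(sessions)
    rules: Dict[str, List[Tuple[str, float]]] = {}
    for (a, b), count in pair_count.items():
        support = count / n
        confidence = count / page_count[a]
        if support >= min_support and confidence >= min_confidence:
            rules.setdefault(a, []).append((b, confidence))
    for targets in rules.values():
        targets.sort(key=lambda t: -t[1])  # most likely next page first
    return rules
```

A prefetcher could then look up the current page in the returned dictionary and fetch the highest-confidence targets. For instance, from the sessions [home, news, sports], [home, news], [home, shop] and [news, sports] with min_support = 0.5 and min_confidence = 0.6, the sketch yields home -> news and news -> sports, each with confidence 2/3.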
Forecasting Stock Time-Series using Data Approximation and Pattern Sequence Similarity
Time series analysis is the process of building a model using statistical
techniques to represent the characteristics of time series data. Processing and
forecasting huge volumes of time series data is a challenging task. This paper
presents Approximation and Prediction of Stock Time-series data (APST), a
two-step approach to predicting the direction of change of stock price indices.
First, it performs data approximation using a technique called Multilevel
Segment Mean (MSM). In the second phase, prediction is performed on the
approximated data using Euclidean distance and a Nearest-Neighbour technique.
The computational cost of data approximation is O(n*ni) and the computational
cost of the prediction task is O(m*|NN|). Thus, the proposed method is more
accurate and requires less prediction time than the existing Label Based
Forecasting (LBF) method [1].
Comment: 11 pages
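To make the two APST phases concrete, here is a minimal Python sketch under simplifying assumptions. The approximation step uses single-level block means as a stand-in for MSM. The prediction step compares the most recent window of approximated values against every earlier window by Euclidean distance (`math.dist`, Python 3.8+) and takes a majority vote over whether the k nearest windows were followed by a rise. The parameters `block`, `window` and `k` and both function names are hypothetical; this is not the authors' implementation.

```python
import math
from typing import List, Sequence


def block_means(series: Sequence[float], block: int) -> List[float]:
    """Approximate `series` by the mean of each full block of `block` points."""
    usable = len(series) - len(series) % block  # drop a trailing partial block
    return [sum(series[i:i + block]) / block for i in range(0, usable, block)]


def predict_direction(series: Sequence[float], block: int, window: int, k: int) -> str:
    """Predict 'up' or 'down' for the next approximated value."""
    approx = block_means(series, block)
    query = approx[-window:]
    candidates = []
    # Every earlier window that is followed by at least one known value.
    for start in range(len(approx) - window):
        past = approx[start:start + window]
        dist = math.dist(past, query)
        moved_up = approx[start + window] > past[-1]
        candidates.append((dist, moved_up))
    if not candidates:
        raise ValueError("series too short for the chosen block and window")
    nearest = sorted(candidates, key=lambda c: c[0])[:k]
    ups = sum(1 for _, up in nearest if up)
    # Strict majority predicts 'up'; ties fall back to 'down'.
    return "up" if ups * 2 > len(nearest) else "down"
```

On a steadily rising series such as 1, 2, ..., 12 with block = 2, window = 2 and k = 3, every historical window is followed by a rise, so the sketch predicts "up". Resolving ties as "down" is an arbitrary choice.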
Resource boxing: Converting realistic cloud task utilization patterns for theoretical scheduling
Scheduling is a core component of distributed systems that determines the optimal allocation of tasks to servers. This is challenging in modern Cloud computing systems, which comprise millions of tasks executing on thousands of heterogeneous servers. Theoretical scheduling can provide complete and sophisticated algorithms for a single objective function. However, Cloud computing systems pursue multiple and often conflicting objectives in order to provide high levels of performance, availability, reliability and energy efficiency. As a result, theoretical scheduling for Cloud computing relies on simplifying assumptions to remain applicable. This is especially true for task utilization patterns, which fluctuate in practice yet are modelled as piecewise constant in theoretical scheduling models. While there is existing work on modelling dynamic Cloud task patterns for evaluating applied scheduling, such models are incompatible with the inputs needed for theoretical scheduling, which requires these patterns to be represented as boxes. Presently, no methods exist that can accurately convert real task patterns derived from empirical data into boxes. This leaves a significant gap that prevents theoreticians from understanding, and proposing algorithms derived from, realistic assumptions for enhanced Cloud scheduling. This work proposes resource boxing: an approach for automatically converting realistic task patterns in Cloud computing directly into box inputs for theoretical scheduling. We propose four resource conversion algorithms capable of accurately representing real task utilization patterns as scheduling boxes. The algorithms were evaluated using production Cloud trace data, demonstrating a difference of less than 5% between real utilization and scheduling boxes. We also provide an application showing how resource boxing can be exploited to translate research from the applied community directly into the theoretical community.
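The abstract does not describe the four conversion algorithms, so the Python sketch below shows only one hypothetical way to turn a sampled utilization pattern into piecewise-constant boxes. Consecutive samples are grouped greedily while their range (max - min) stays within a tolerance `tol`, and each box takes the mean of its samples as its height. The `Box` type, the greedy rule and the error metric (total absolute gap divided by total utilization) are our assumptions; the paper's own algorithms, and the way it measures the under-5% difference, may differ.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Box:
    start: int      # index of the first sample covered (inclusive)
    end: int        # index after the last sample covered (exclusive)
    height: float   # constant utilization assigned to the whole box


def to_boxes(util: Sequence[float], tol: float) -> List[Box]:
    """Hypothetical greedy boxing: extend a box while its samples' range <= tol."""
    boxes: List[Box] = []
    start = 0
    while start < len(util):
        lo = hi = util[start]
        end = start + 1
        while end < len(util):
            lo, hi = min(lo, util[end]), max(hi, util[end])
            if hi - lo > tol:
                break  # sample `end` would stretch the box too far; start a new one
            end += 1
        segment = util[start:end]
        boxes.append(Box(start, end, sum(segment) / len(segment)))
        start = end
    return boxes


def relative_error(util: Sequence[float], boxes: Sequence[Box]) -> float:
    """Total absolute gap between samples and box heights, relative to total usage."""
    total = sum(util)
    if total == 0:
        return 0.0
    gap = sum(abs(util[t] - b.height) for b in boxes for t in range(b.start, b.end))
    return gap / total
```

For a trace such as [0.2, 0.21, 0.19, 0.8, 0.82, 0.8] with tol = 0.05, the sketch produces two boxes (samples 0-2 and 3-5) with a relative error of roughly 1.5%. A smaller `tol` generally yields more, narrower boxes and a lower error, trading scheduler input size for fidelity.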